-
Notifications
You must be signed in to change notification settings - Fork 0
Progress Feedback for build command #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| for context in contexts: | ||
| emitter = ProgressEmitter(progress) | ||
|
|
||
| if not contexts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if-block is not needed. It will behave the exact same way without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll remove.
| @dataclass(frozen=True, slots=True) | ||
| class ProgressEvent: | ||
| kind: ProgressKind | ||
| datasource_id: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems datasource_id is not_null here? or am I missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, not all events emitted are datasource-scope. Events such as TASK_STARTED and TASK_FINISHED do not use a datasource_id, so it's made optional here. We can implement more versions of ProgressEvents for each ProgressKind, and then each kind of ProgressEvent would have a more defined structure, but it might be overkill at this point.
| _set_datasource_percent(float(pct)) | ||
| return | ||
|
|
||
| root = logging.getLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this logger manipulation is needed? Looks sketchy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code is doing 1 thing mainly: silencing all logging globally. It is only activated if we are working with an interactive terminal and the rich bar progress shows up.
This was done because "normal" logging causes big disturbances when interactive with rich logging.
There are alternatives to this, such as routing logs via a RichHandler when in interactive mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new version of the PR, this is made easier, by routing the logs via a RichHandler, and this bottom part of the code looks less sketchy too.
| ) | ||
| ) | ||
| if i % EMIT_EVERY == 0 or i == len(chunks): | ||
| total_units = len(chunks) * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot grasp the logic with len(chunks) multiplied by 2. Can you explain please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ok, I see. For every embedding there are 2 steps - get embedding and persist it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the problem would be that plugin execution is actually a single big progress step in the progress bar. Did you skip this part intentionally or leave for later?
I think we need some DatasourceEmitter object which would know everything about datasource progress status. Another problem at the moment is that this logic about total_units is separated between 2 different methods. They need to know about each other and become tightly coupled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the *2 exists because each chunk will have 2 operations performed for it. Regarding the plugin execution, I decided not to touch the plugins themselves and assume the introspection as an "atomic" process that doesn't get updated.
If we feel like this would be a significant win, we can definitely pass an emitter to the plugin and have the plugin emit events during the normal execution too. Let me know what you think.
| total_units=total_units, | ||
| ) | ||
|
|
||
| emitter.datasource_progress_units( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, this event has been already emitted in line 118. Because if condition contains or i == len(chunks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Initially I didn't have the or i == len(chunks) in the condition, so this was a remnant from then. I'll remove it
This PR adds structured progress feedback to the build command, usable both from the CLI and from SDK callers.
Changes
Progress API
ProgressCallbackand a small set ofProgressEventkinds to report:ProgressEmitterBuild pipeline wiring
ProgressCallback(optional) through the build entrypoints.CLI progress rendering (Rich)
rich_progress()context manager that renders:cliextra with fallback to no-op ifrichisn't installed.Screen.Recording.2026-02-10.at.11.06.41.mov